15 research outputs found

    Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google's FLoC and the MinHash Hierarchy System

    Recently proposed systems aim at achieving privacy using locality-sensitive hashing. We show how these approaches fail by presenting attacks against two such systems: Google's FLoC proposal for privacy-preserving targeted advertising and the MinHash Hierarchy, a system for processing mobile users' traffic behavior in a privacy-preserving way. Our attacks refute the pre-image resistance, anonymity, and privacy guarantees claimed for these systems. In the case of FLoC, we show how to deanonymize users using Sybil attacks and to reconstruct 10% or more of the browsing history for 30% of its users using Generative Adversarial Networks. We achieve this by analyzing only the hashes used by FLoC. For MinHash, we precisely identify the movement of a subset of individuals and, on average, we can limit users' movement to just 10% of the possible geographic area, again using just the hashes. In addition, we refute their differential privacy claims. Comment: 14 pages, 9 figures, submitted to PETS 202
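    The attacks exploit a basic property of locality-sensitive hashing: similar inputs produce similar hashes, so the hashes themselves reveal how much two users' underlying data overlap. Below is a minimal, self-contained sketch of that leakage for MinHash; it is not the paper's attack code, and the hash family and example domains are illustrative assumptions.

```python
# Minimal sketch (not the paper's attack code): MinHash signatures preserve
# Jaccard similarity, so an observer who sees only the hashes can still
# estimate how strongly two users' underlying sets (e.g. browsing histories
# or visited locations) overlap; this is the kind of leakage the attacks build on.
import random

PRIME = 2**61 - 1  # large prime for a simple universal hash family

def minhash_signature(items, num_hashes=128, seed=0):
    """Compute a MinHash signature (list of ints) for a set of string items."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(num_hashes)]
    return [min((a * hash(x) + b) % PRIME for x in items) for a, b in params]

def estimated_jaccard(sig1, sig2):
    """The fraction of matching positions estimates the Jaccard similarity."""
    return sum(s1 == s2 for s1, s2 in zip(sig1, sig2)) / len(sig1)

# Two users with heavily overlapping histories produce similar signatures,
# so the hashes alone suffice to link or profile them (true Jaccard: 4/6 ≈ 0.67).
alice = {"news.example", "shop.example", "mail.example", "blog.example", "wiki.example"}
bob = {"news.example", "shop.example", "mail.example", "video.example", "wiki.example"}
print(estimated_jaccard(minhash_signature(alice), minhash_signature(bob)))
```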

    S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees

    Privacy-preserving learning of gradient boosting decision trees (GBDT) has the potential for strong utility-privacy tradeoffs on tabular data, such as census data or medical metadata: classical GBDT learners can extract non-linear patterns from small datasets. The state-of-the-art notion for provable privacy properties is differential privacy, which requires that the impact of single data points is limited and deniable. We introduce a novel differentially private GBDT learner and utilize four main techniques to improve the utility-privacy tradeoff. (1) We use an improved noise scaling approach with tighter accounting of the privacy leakage of a decision tree leaf compared to prior work, resulting in noise that in expectation scales with O(1/n) for n data points. (2) We integrate individual Rényi filters into our method to learn from data points that have been underutilized during an iterative training process, which, potentially of independent interest, yields a natural yet effective approach to learning on streams of non-i.i.d. data. (3) We incorporate the concept of random decision tree splits to concentrate the privacy budget on learning leaves. (4) We deploy subsampling for privacy amplification. Our evaluation shows, for the Abalone dataset (<4k training data points), an R²-score of 0.39 for ε = 0.15, which the closest prior work only achieved for ε = 10.0. On the Adult dataset (50k training data points) we achieve a test error of 18.7% for ε = 0.07, which the closest prior work only achieved for ε = 1.0. For the Abalone dataset at ε = 0.54 we achieve an R²-score of 0.47, very close to the R²-score of 0.54 of the non-private version of GBDT. For the Adult dataset at ε = 0.54 we achieve a test error of 17.1%, very close to the test error of 13.7% of the non-private version of GBDT. Comment: The first two authors contributed equally to this work.
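    The noise-scaling idea can be pictured on a single tree leaf: with per-point gradient clipping, the leaf's gradient sum has bounded sensitivity, so a fixed amount of calibrated noise is added to the sum and then divided by the number of points, making the error in the leaf value shrink roughly like 1/n. The sketch below shows only this simplified ingredient under standard DP assumptions; it is not the authors' S-GBDT implementation (the tighter Rényi accounting, individual filters, random splits, and subsampling amplification are omitted, and the function name and defaults are hypothetical).

```python
# Simplified sketch of one ingredient of differentially private GBDT: releasing
# a leaf value as a noisy sum of clipped gradients. This is NOT the S-GBDT
# implementation; the paper's tighter accounting, Renyi filters, random splits
# and subsampling amplification are all omitted.
import numpy as np

def dp_leaf_value(gradients, clip=1.0, epsilon=0.5, learning_rate=0.1, rng=None):
    """Epsilon-DP leaf value for a squared-error GBDT leaf.

    Each data point contributes one gradient, clipped to [-clip, clip], so the
    gradient sum has L1 sensitivity `clip`; Laplace noise with scale clip/epsilon
    makes this single release epsilon-DP. Dividing by the (assumed public) leaf
    size n afterwards means the noise in the returned value is O(1/(epsilon*n)).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    g = np.clip(np.asarray(gradients, dtype=float), -clip, clip)
    noisy_sum = g.sum() + rng.laplace(loc=0.0, scale=clip / epsilon)
    return -learning_rate * noisy_sum / len(g)

# With more points in the leaf, the same epsilon gives a far more accurate value.
print(dp_leaf_value(np.full(100, 0.3)))      # noticeably noisy
print(dp_leaf_value(np.full(10_000, 0.3)))   # close to -0.1 * 0.3 = -0.03
```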

    Efficient and Extensible Policy Mining for Relationship-Based Access Control

    Relationship-based access control (ReBAC) is a flexible and expressive framework that allows policies to be expressed in terms of chains of relationships between entities as well as attributes of entities. ReBAC policy mining algorithms have the potential to significantly reduce the cost of migration from legacy access control systems to ReBAC by partially automating the development of a ReBAC policy. Existing ReBAC policy mining algorithms support a policy language with a limited set of operators, which limits their applicability. This paper presents a ReBAC policy mining algorithm designed to be both (1) easily extensible (to support additional policy language features) and (2) scalable. The algorithm is based on Bui et al.'s evolutionary algorithm for ReBAC policy mining. First, we simplify their algorithm to make it easier to extend, and we provide a methodology for extending it to handle new policy language features. However, extending the policy language increases the search space of candidate policies explored by the evolutionary algorithm, causing longer running times and/or worse results. To address this problem, we enhance the algorithm with a feature selection phase. The enhancement utilizes a neural network to identify useful features. We use the result of feature selection to reduce the evolutionary algorithm's search space. The new algorithm is easy to extend and, as shown by our experiments, is more efficient and produces better policies.
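    A schematic view of the two-phase idea, under loose assumptions about the policy representation: a feature selection phase first scores candidate features (in the paper this is done with a neural network), and the evolutionary search then recombines and mutates candidate rules only over the features deemed useful. The sketch below is a generic illustration, not Bui et al.'s algorithm or the paper's implementation; the rule encoding and fitness function are placeholders.

```python
# Schematic illustration (not the paper's implementation): score candidate
# features first, restrict the evolutionary search to the useful ones, then
# evolve candidate rules against the access-control data. The rule encoding
# (a frozenset of feature conditions) and the fitness function are placeholders.
import random

def select_features(features, usefulness, threshold=0.5):
    """Feature selection phase. In the paper the usefulness scores come from a
    neural network; here `usefulness` is any callable returning a value in [0, 1]."""
    return [f for f in features if usefulness(f) >= threshold]

def evolve_policy(useful_features, fitness, generations=50, pop_size=40, seed=0):
    """Toy evolutionary loop over candidate rules built from the selected features."""
    rng = random.Random(seed)
    population = [frozenset(rng.sample(useful_features, k=2)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, k=2)
            child = set(a) | set(b)                # crossover: merge conditions
            if rng.random() < 0.3:                 # mutation: add a condition
                child.add(rng.choice(useful_features))
            children.append(frozenset(child))
        population = survivors + children
    return max(population, key=fitness)
```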

    Transitive primal infon logic: The propositional case, Microsoft Research

    Abstract: Primal (propositional) logic PL is the {∧, →} fragment of intuitionistic logic, and primal (propositional) infon logic PIL is a conservative extension of PL with the quotation construct "p said". Logic PIL was introduced by Gurevich and Neeman in 2009 in connection with the DKAL project. The derivation problem for PIL (and therefore for PL) is solvable in linear time, and yet PIL allows one to express many common access control scenarios. The most obvious limitations on the expressivity of logics PL and PIL are the failures of the transitivity rules (trans0): x → y, y → z ⊢ x → z and (trans): pref (x → y), pref (y → z) ⊢ pref (x → z), respectively, where pref ranges over quotation prefixes p said q said . . .. Here we investigate the extension T of PL with an axiom x → x and the inference rule (trans0), as well as the extension qT of PIL with an axiom pref (x → x) and the inference rule (trans).
    • [Subformula property] T has the subformula property: if Γ ⊢ y then there is a derivation of y from Γ comprising only subformulas of Γ ∪ {y}. qT has a similar locality property.
    • [Complexity] The derivation problems for T and qT are solvable in quadratic time.
    • [Soundness and completeness] We define Kripke models for qT (resp. T) and show that the semantics is sound and complete.
    • [Small models] T has the one-element-model property: if Γ ⊬ y then there is a one-element counterexample. Similarly small (though not one-element) counterexamples exist for qT.
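    To make the derivation-problem claim concrete, here is a minimal derivability check in the style such results usually suggest: restrict attention to subformulas of Γ ∪ {y} (justified by the subformula property) and saturate under the rules to a fixpoint. This is an assumption-laden sketch using the standard primal-logic rules (∧-introduction/elimination, modus ponens, and the primal rule y ⊢ x → y) plus (trans0); it is not taken from the paper and ignores quotations (qT).

```python
# Assumption-laden sketch (not from the paper): decide Γ ⊢ y in the {∧, →}
# fragment extended with transitivity by saturating the derivable subformulas
# of Γ ∪ {y} to a fixpoint. The subformula property is what makes this
# restriction complete; the loop runs in polynomial time in the input size.
# Formulas: atoms are strings, ("and", a, b) and ("imp", a, b) are tuples.

def subformulas(phi, acc=None):
    acc = set() if acc is None else acc
    acc.add(phi)
    if isinstance(phi, tuple):
        subformulas(phi[1], acc)
        subformulas(phi[2], acc)
    return acc

def derivable(gamma, goal):
    subs = set()
    for phi in list(gamma) + [goal]:
        subformulas(phi, subs)
    # Hypotheses plus instances of the axiom x → x (restricted to subformulas).
    known = set(gamma) | {phi for phi in subs
                          if isinstance(phi, tuple) and phi[0] == "imp" and phi[1] == phi[2]}
    changed = True
    while changed:
        new = set()
        for phi in subs - known:
            if isinstance(phi, tuple):
                op, a, b = phi
                if op == "and" and a in known and b in known:
                    new.add(phi)          # ∧-introduction
                if op == "imp" and b in known:
                    new.add(phi)          # primal →-introduction: y ⊢ x → y
        for phi in known:
            if isinstance(phi, tuple) and phi[0] == "and":
                new |= {phi[1], phi[2]} - known        # ∧-elimination
            if isinstance(phi, tuple) and phi[0] == "imp" and phi[1] in known:
                new.add(phi[2])                        # modus ponens
        # (trans0): from x → y and y → z derive x → z, if it is a subformula.
        imps = [p for p in known if isinstance(p, tuple) and p[0] == "imp"]
        for xy in imps:
            for yz in imps:
                cand = ("imp", xy[1], yz[2])
                if xy[2] == yz[1] and cand in subs:
                    new.add(cand)
        new -= known
        known |= new
        changed = bool(new)
    return goal in known

# Transitivity now holds: from p → q and q → r we can derive p → r.
print(derivable({("imp", "p", "q"), ("imp", "q", "r")}, ("imp", "p", "r")))  # True
```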

    Basic primal infon logic


    Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google's FLoC and the MinHash Hierarchy System

    Recently proposed systems aim at achieving privacy using locality-sensitive hashing. We show how these approaches fail by presenting attacks against two such systems: Google’s FLoC proposal for privacy-preserving targeted advertising and the MinHash Hierarchy, a system for processing location trajectories in a privacy-preserving way. Our attacks refute the pre-image resistance, anonymity, and privacy guarantees claimed for these systems. In the case of FLoC, we show how to deanonymize users using Sybil attacks and to reconstruct 10% or more of the browsing history for 30% of its users using Generative Adversarial Networks. We achieve this by analyzing only the hashes used by FLoC. For MinHash, we precisely identify the location trajectory of a subset of individuals and, on average, we can limit users’ trajectory to just 10% of the possible geographic area, again using just the hashes. In addition, we refute their differential privacy claims. ISSN: 2299-098

    Automating Cookie Consent and GDPR Violation Detection

    The European Union’s General Data Protection Regulation (GDPR) requires websites to inform users about personal data collection and request consent for cookies. Yet the majority of websites do not give users any choices, and others attempt to deceive them into accepting all cookies. We document the severity of this situation through an analysis of potential GDPR violations in cookie banners on almost 30k websites. We identify six novel violation types, such as incorrect category assignments and misleading expiration times, and we find at least one potential violation in a surprising 94.7% of the analyzed websites. We address this issue by giving users the power to protect their privacy. We develop a browser extension, called CookieBlock, that uses machine learning to enforce GDPR cookie consent at the client. It automatically categorizes cookies by usage purpose using only the information provided in the cookie itself. At a mean validation accuracy of 84.4%, our model attains a prediction quality competitive with expert knowledge in the field. Additionally, our approach differs from prior work by not relying on the cooperation of websites themselves. We empirically evaluate CookieBlock on a set of 100 randomly sampled websites, on which it filters roughly 90% of the privacy-invasive cookies without significantly impairing website functionality.
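    The client-side classification step can be pictured as follows: extract features from the cookie record alone (name, expiration, value shape, party), predict a consent category, and drop cookies from categories the user rejected. This sketch is purely illustrative and is not CookieBlock's actual feature set, model, or training data; all names and values below are hypothetical.

```python
# Purely illustrative sketch (not CookieBlock's actual features, model, or data):
# classify a cookie's purpose from attributes available in the cookie record
# itself, then block cookies whose predicted purpose the user has not allowed.
from sklearn.feature_extraction import DictVectorizer
from sklearn.ensemble import RandomForestClassifier

def cookie_features(cookie):
    """Features derived only from the cookie itself, no website cooperation."""
    return {
        "name": cookie["name"].lower(),
        "expiry_days": cookie["expiry_days"],
        "value_length": len(cookie["value"]),
        "is_session": cookie["expiry_days"] == 0,
        "third_party": cookie["third_party"],
    }

# Hypothetical labelled examples; a real model would be trained on cookies
# annotated with their declared consent categories.
train = [
    ({"name": "PHPSESSID", "expiry_days": 0, "value": "abc123", "third_party": False}, "necessary"),
    ({"name": "lang", "expiry_days": 30, "value": "en", "third_party": False}, "functional"),
    ({"name": "_ga", "expiry_days": 730, "value": "GA1.2.123.456", "third_party": False}, "analytics"),
    ({"name": "IDE", "expiry_days": 390, "value": "xyzTrackingId", "third_party": True}, "advertising"),
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform([cookie_features(c) for c, _ in train])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, [label for _, label in train])

def should_block(cookie, allowed=("necessary", "functional")):
    """Enforce consent at the client: block anything outside the allowed purposes."""
    predicted = clf.predict(vec.transform([cookie_features(cookie)]))[0]
    return predicted not in allowed
```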